Search CORE

44 research outputs found

Task-specific Word Identification from Short Texts Using a Convolutional Neural Network

Authors: Wu Xintao, Xiang Yang, Yuan Shuhan
Publication date: 02/06/2017

Task-specific word identification aims to choose the task-related words that best describe a short text. Existing approaches require well-defined seed words or lexical dictionaries (e.g., WordNet), which are often unavailable in applications such as social discrimination detection and fake review detection. However, we often have a set of labeled short texts in which each text carries a task-related class label (e.g., discriminatory or non-discriminatory) that is either specified by users or learned by classification algorithms. In this paper, we identify task-specific words and phrases in short texts by exploiting these class labels rather than seed words or lexical dictionaries. We treat task-specific word and phrase identification as a feature learning problem: we train a convolutional neural network on a set of labeled texts and use score vectors to localize the task-specific words and phrases. Experimental results on sentiment word identification show that our approach significantly outperforms existing methods. We further conduct two case studies to show the effectiveness of our approach. The first, on a dataset of crawled tweets, demonstrates that our approach can successfully capture discrimination-related words and phrases. The second, on fake review detection, shows that our approach can identify fake-review words and phrases. Comment: accepted by Intelligent Data Analysis, an International Journal

arXiv.org e-Print Archive

ScholarWorks@UARK

UARK (University of Arkansas)
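The abstract above says a convolutional neural network is trained on labeled texts and its score vectors are used to localize task-specific words and phrases. As a rough, hypothetical illustration of how convolutional outputs can be traced back to individual n-grams, the sketch below slides a single width-2 filter over toy word embeddings, multiplies each window's ReLU activation by a class weight, and ranks the windows. The embeddings, the filter, the class weight, and the helper `score_windows` are all assumptions for illustration; the paper's actual score-vector computation may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical 2-dimensional word embeddings (a trained model would learn these).
EMBEDDINGS: Dict[str, List[float]] = {
    "the": [0.0, 0.1],
    "food": [0.2, 0.0],
    "was": [0.0, 0.0],
    "really": [0.5, 0.3],
    "awful": [0.9, 0.8],
}


def score_windows(
    tokens: List[str],
    conv_filter: List[List[float]],  # one row of weights per position in the window
    class_weight: float,             # hypothetical weight linking this filter to the target class
) -> List[Tuple[str, float]]:
    """Score every n-gram window by its ReLU filter activation times the class weight."""
    width = len(conv_filter)
    scored = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        # Convolution at this position: dot product of the filter with the stacked embeddings.
        activation = sum(
            w * x
            for row, token in zip(conv_filter, window)
            for w, x in zip(row, EMBEDDINGS[token])
        )
        activation = max(0.0, activation)  # ReLU
        scored.append((" ".join(window), activation * class_weight))
    # Highest-scoring windows are the candidate task-specific phrases.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = score_windows("the food was really awful".split(), [[1.0, 1.0], [1.0, 1.0]], 2.0)
print(ranking[0][0])  # really awful
```

A real model would use many filters of several widths and learned weights; a single filter keeps the mapping from window to score easy to follow.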

SAFE: A Neural Survival Analysis Model for Fraud Early Detection

Authors: Wu Xintao, Yuan Shuhan, Zheng Panpan
Publication date: 13/11/2018

Many online platforms have deployed anti-fraud systems to detect and prevent fraudulent activities. However, there is usually a gap between the time a user commits a fraudulent action and the time the platform suspends that user, so detecting fraudsters in time is a challenging problem. Most existing approaches adopt classifiers to predict fraudsters from their activity sequences over time. The main drawback of classification models is that their predictions at consecutive timestamps are often inconsistent. In this paper, we propose SAFE, a survival-analysis-based fraud early detection model that maps dynamic user activities to survival probabilities that are guaranteed to decrease monotonically over time. SAFE adopts a recurrent neural network (RNN) to handle user activity sequences and directly outputs a hazard value at each timestamp; the survival probabilities derived from these hazard values then yield consistent predictions. Because the training data record only the time a user was suspended, not the time of the fraudulent activity, we revise the loss function of the regular survival model to achieve fraud early detection. Experimental results on two real-world datasets demonstrate that SAFE outperforms the survival analysis model and the recurrent neural network model used alone, as well as state-of-the-art fraud early detection approaches. Comment: To appear in AAAI-201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
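The SAFE abstract relies on survival probabilities derived from per-timestamp hazard values being monotonically decreasing. The abstract does not give the formula, so the sketch below assumes the standard discrete-time relation $S(t) = \prod_{k \le t} (1 - h_k)$ with each hazard $h_k \in [0, 1]$; SAFE's exact parameterization may differ. The RNN that would produce the hazards is omitted, and the alerting rule in the hypothetical helper `flag_time` (with its `threshold` parameter) is an illustrative assumption.

```python
from typing import List


def survival_curve(hazards: List[float]) -> List[float]:
    """Turn per-timestamp hazard values h_t in [0, 1] into survival probabilities.

    Assumes the discrete-time relation S(t) = prod_{k <= t} (1 - h_k).
    Every factor lies in [0, 1], so S(t) can never increase over time.
    """
    survival = []
    s = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError(f"hazard must lie in [0, 1], got {h}")
        s *= 1.0 - h
        survival.append(s)
    return survival


def flag_time(hazards: List[float], threshold: float = 0.5) -> int:
    """Return the first timestamp whose survival probability drops below threshold, or -1.

    Hypothetical alerting rule for illustration only.
    """
    for t, s in enumerate(survival_curve(hazards)):
        if s < threshold:
            return t
    return -1


# Hazards such as an RNN might emit for one user's activity sequence (made-up values).
print(flag_time([0.1, 0.2, 0.05, 0.6]))  # 3
```

Because the curve can only go down, once a user is flagged at timestamp t the prediction never flips back at t + 1. This is the consistency property the abstract contrasts with per-timestamp classifiers.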

SaTC: CORE: Small: Deep Learning for Insider Threat Detection

Author: Yuan Shuhan
Publication venue: Hosted by Utah State University Libraries
Publication date: 19/03/2021

DigitalCommons@USU

Spectrum-based deep neural networks for fraud detection

Authors: Li Jun, Lu Aidong, Wu Xintao, Yuan Shuhan
Publication date: 01/01/2017

In this paper, we focus on fraud detection on a signed graph with only a small set of labeled training data. We propose a novel framework that combines deep neural networks and spectral graph analysis. In particular, we use each node's projection (called its spectral coordinate) in the low-dimensional spectral space of the graph's adjacency matrix as the input to deep neural networks. Spectral coordinates capture the most useful topological information of the network. Because spectral coordinates have a small dimension compared with the adjacency matrix of the graph, training deep neural networks becomes feasible. We develop and evaluate two neural networks, a deep autoencoder and a convolutional neural network, within our fraud detection framework. Experimental results on a real signed graph show that our spectrum-based deep neural networks are effective in fraud detection.

arXiv.org e-Print Archive

ScholarWorks@UARK

UARK (University of Arkansas)
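The spectrum-based abstract feeds each node's spectral coordinate, its projection onto a few eigenvectors of the signed graph's adjacency matrix, into a neural network. A minimal sketch of that projection is below, assuming an undirected (symmetric) signed adjacency matrix and assuming the k eigenvectors with the largest eigenvalues are kept. The paper's exact eigenvector selection may differ, and the function name `spectral_coordinates` and the toy graph are hypothetical.

```python
import numpy as np


def spectral_coordinates(adjacency: np.ndarray, k: int) -> np.ndarray:
    """Project each node onto k leading eigenvectors of a symmetric adjacency matrix.

    Row i of the result is node i's k-dimensional spectral coordinate.
    """
    if not np.allclose(adjacency, adjacency.T):
        raise ValueError("expected a symmetric (undirected) adjacency matrix")
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(adjacency)
    # Assumption: keep the eigenvectors of the k largest eigenvalues, largest first.
    leading = np.argsort(eigenvalues)[::-1][:k]
    return eigenvectors[:, leading]


# A small signed graph: +1 for positive (trust) edges, -1 for negative (distrust) edges.
A = np.array([
    [0, 1, 1, -1],
    [1, 0, 1, 0],
    [1, 1, 0, -1],
    [-1, 0, -1, 0],
], dtype=float)

coords = spectral_coordinates(A, k=2)
print(coords.shape)  # (4, 2)
```

Each node is now described by k numbers instead of a full row of the n-by-n adjacency matrix. That reduction is why the abstract says training deep networks becomes feasible.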

LogGPT: Log Anomaly Detection via GPT

Authors: Han Xiao, Trabelsi Mohamed, Yuan Shuhan
Publication date: 25/09/2023

Detecting system anomalies based on log data is important for ensuring the security and reliability of computer systems. Recently, deep learning models have been widely used for log anomaly detection. The core idea is to model log sequences as natural language and adopt deep sequential models, such as LSTMs or Transformers, to encode the normal patterns in log sequences via language modeling. However, there is a gap between language modeling and anomaly detection, because the language modeling loss used to train a sequential model is not directly related to anomaly detection. To close this gap, we propose LogGPT, a novel framework that employs GPT for log anomaly detection. LogGPT is first trained to predict the next log entry based on the preceding sequence. To further enhance its performance, we propose a novel reinforcement learning strategy that fine-tunes the model specifically for the log anomaly detection task. Experimental results on three datasets show that LogGPT significantly outperforms existing state-of-the-art approaches.

arXiv.org e-Print Archive
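LogGPT is first trained to predict the next log entry from the preceding sequence. One common way to turn such a predictor into a detector is to flag a sequence when an observed next log key falls outside the model's top-k predictions. Whether LogGPT uses exactly this rule is an assumption here. To stay runnable without model weights, the sketch below swaps GPT for a simple bigram count model, and it omits the reinforcement learning fine-tuning. The function names, `top_k`, and the log keys are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def train_next_key_model(normal_sequences: Sequence[Sequence[str]]) -> Dict[str, Counter]:
    """Count which log key follows which in normal sequences (a stand-in for a GPT model)."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for seq in normal_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            model[prev][nxt] += 1
    return model


def is_anomalous(model: Dict[str, Counter], seq: Sequence[str], top_k: int = 2) -> bool:
    """Flag seq if some observed next key is not among the top-k predicted keys."""
    for prev, nxt in zip(seq, seq[1:]):
        # .get avoids inserting unseen keys into the defaultdict.
        predicted = [key for key, _ in model.get(prev, Counter()).most_common(top_k)]
        if nxt not in predicted:
            return True
    return False


normal = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
]
model = train_next_key_model(normal)
print(is_anomalous(model, ["open", "read", "close"]))    # False
print(is_anomalous(model, ["open", "delete", "close"]))  # True
```

The abstract's point is that this next-entry objective is only indirectly tied to detection. LogGPT's reinforcement learning step is meant to align the model more closely with the detection rule itself.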

Teaching the Portuguese Language in China: An Analysis of Selected Curricula (original title: Ensino da língua portuguesa na China: uma análise de alguns planos curriculares)

Author: Yuan Shuhan
Publication date: 24/06/2014

In recent years, China has invested heavily in teaching Portuguese as a foreign language (PFL) in order to strengthen trade relations with Portuguese-speaking countries. To meet the resulting demand for exchange, undergraduate programs in Portuguese have grown and expanded considerably at Chinese higher-education institutions. This work studies the curricula of bachelor's programs in Portuguese in mainland China and analyzes several of them, in particular comparing the curriculum of Xi'an International Studies University with that of the University of Macau. The aim is to follow the current development of Portuguese teaching in the Chinese context and to identify the problems that exist.

Universidade de Lisboa: Repositório.UL

Robust Fraud Detection via Supervised Contrastive Learning

Authors: S. Vinay M., Wu Xintao, Yuan Shuhan
Publication date: 19/08/2023

Deep learning models have recently become popular for detecting malicious user activity sessions on computing platforms. In many real-world scenarios, only a few labeled malicious sessions and a large number of normal sessions are available. These few labeled malicious sessions usually do not cover the full diversity of possible malicious sessions, which in many scenarios can be highly diverse. As a consequence, the session representations learned by deep learning models can fail to generalize to unseen malicious sessions. To tackle this open-set fraud detection challenge, we propose ConRo, a robust supervised contrastive learning framework designed specifically for the scenario in which only a few malicious sessions with limited diversity are available. ConRo applies an effective data augmentation strategy to generate diverse potential malicious sessions. Using these generated sessions together with the available training sessions, ConRo leverages supervised contrastive learning to derive representations that are separable with respect to the open-set fraud detection task. We empirically evaluate ConRo and other state-of-the-art baselines on benchmark datasets, and ConRo shows a noticeable performance improvement over these baselines. Comment: 16 pages, 5 figures, and 3 tables

arXiv.org e-Print Archive
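ConRo's representations are learned with supervised contrastive learning. The sketch below implements the generic supervised contrastive (SupCon-style) loss on already-computed embeddings, so the pieces the abstract mentions can be seen concretely: embeddings are L2-normalized, and each anchor is pulled toward same-label samples and pushed away from the rest. It is an assumption that ConRo uses this exact form. The session encoder, ConRo's data augmentation, and the `temperature` value are not taken from the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """SupCon-style loss averaged over anchors that have at least one positive."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without same-label partners contributes nothing
        # Cosine similarity from anchor i to every other sample, scaled by the temperature.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        # Log of the softmax denominator, computed stably via log-sum-exp.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability assigned to each positive.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(points, [0, 0, 1, 1]))  # small: same-label points are close
print(supervised_contrastive_loss(points, [0, 1, 0, 1]))  # large: same-label points are far apart
```

Minimizing this loss over an encoder's outputs drives same-class sessions together. In ConRo, the augmented malicious sessions are what widen the malicious class beyond the few labeled examples.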